A Meta-learning Method Based on Temporal Difference Error

Authors

  • Kunikazu Kobayashi
  • Hiroyuki Mizoue
  • Takashi Kuremoto
  • Masanao Obayashi
Abstract

In general, meta-parameters in a reinforcement learning system, such as the learning rate and the discount rate, are determined empirically and fixed during learning. Consequently, when the external environment changes, the system cannot adapt to the variation. Meanwhile, it has been suggested that the biological brain may conduct reinforcement learning and adapt to the external environment by controlling neuromodulators that correspond to the meta-parameters. Based on this suggestion, the present paper proposes a method that adjusts meta-parameters using the temporal difference (TD) error. Computer simulations on a maze search problem and an inverted pendulum control problem verify that the proposed method can appropriately adjust meta-parameters according to variations in the external environment.
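The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of driving a meta-parameter from the TD error. The function `td_error` is the standard TD(0) definition; `adapt_learning_rate`, `run_single_state`, and all their parameters are hypothetical names and a hypothetical rule chosen for illustration, not the authors' method.

```python
def td_error(reward, value_s, value_next, gamma):
    """Standard TD(0) error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_s


def adapt_learning_rate(alpha, delta, avg_abs_delta, step=0.05,
                        alpha_min=0.01, alpha_max=1.0):
    """Hypothetical rule (an assumption, not the paper's method):
    raise alpha when |delta| exceeds its running average, which hints
    that the environment changed; lower it otherwise."""
    factor = 1.0 + step if abs(delta) > avg_abs_delta else 1.0 - step
    return min(alpha_max, max(alpha_min, alpha * factor))


def run_single_state(rewards, alpha=0.1, gamma=0.0, ema=0.1):
    """Toy setting (assumed): one self-looping state whose reward may change."""
    value, avg_abs_delta = 0.0, 0.0
    alphas = []
    for r in rewards:
        delta = td_error(r, value, value, gamma)
        alpha = adapt_learning_rate(alpha, delta, avg_abs_delta)
        value += alpha * delta  # TD(0) value update
        # Exponential moving average of |delta| used as the baseline.
        avg_abs_delta += ema * (abs(delta) - avg_abs_delta)
        alphas.append(alpha)
    return value, alphas


if __name__ == "__main__":
    # The reward jumps from 1 to 5 halfway through, mimicking a changed environment.
    value, alphas = run_single_state([1.0] * 50 + [5.0] * 50)
    print("final value estimate:", value)
    print("alpha just before and after the change:", alphas[49], alphas[50])
```

In this toy run the learning rate shrinks while the TD error settles and grows again when the reward changes, which is the kind of adaptation the abstract describes in qualitative terms.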


Similar resources

Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

One of the most important issues that we face in controlling delayed systems and non-minimum phase systems is to fulfill objective orientations simultaneously and in the best way possible. This paper proposes a new objective-orientation method for controlling multi-objective systems. The method is based on emotional temporal difference learning, and has a...


Basis Function Adaptation in Temporal Difference Reinforcement Learning

We examine methods for on-line optimization of the basis function for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based...


Error Bounds in Reinforcement Learning Policy Evaluation

With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...


The effect of localizing the design and production of Meta-text based e-book on the levels of student learning and retention

Introduction: Applying the capabilities of modern educational technology is an opportunity to achieve effective and optimal learning. Also, localization is the consideration of indigenous knowledge in order to accumulate global knowledge of local needs and desires. In this research, with a native approach and according to the existing need, first the e-book, d...


Cystoscopy Image Classification Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...




Publication date: 2009